Motivation

Formations and player roles are fuzzy concepts. We simplify things into 4-2-3-1s and 4-3-3s, right backs and centre forwards, but within those simplifications there are many nuances in how different teams and different players function.

In this post, I try to quantify those nuances and find teams that are set up similarly across matches.

Clustering team-matches

The team-matches are clustered based on the maximum distance between players.

Each node at the end of this graph is a particular team playing in a particular match.
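To make the idea concrete, here is a minimal sketch of one way such a clustering could work. It is not the exact method used for this post. It assumes each team-match is summarised as a list of average player positions, that players are already listed in a comparable order, and that the distance between two team-matches is the largest gap between corresponding players. The names team_distance, cluster_distance and agglomerate, and all the coordinates, are made up for illustration.

```python
from itertools import combinations
from math import dist

# Hypothetical toy data: average (x, y) positions for four players per
# team-match. Real data would have more players and real coordinates.
shapes = {
    "A-match1": [(20, 20), (20, 50), (20, 80), (50, 50)],
    "A-match2": [(22, 22), (21, 50), (19, 78), (52, 50)],
    "B-match1": [(20, 10), (20, 90), (60, 30), (60, 70)],
    "B-match2": [(21, 12), (19, 88), (62, 31), (58, 69)],
}


def team_distance(a, b):
    """Assumed metric: largest gap between corresponding players."""
    return max(dist(p, q) for p, q in zip(a, b))


def cluster_distance(c1, c2, shapes):
    """Complete linkage: the farthest pair of team-matches across two clusters."""
    return max(team_distance(shapes[i], shapes[j]) for i in c1 for j in c2)


def agglomerate(shapes, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in shapes]  # every team-match starts alone
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], shapes
            ),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return [sorted(c) for c in clusters]


for members in agglomerate(shapes, n_clusters=2):
    print(members)
```

With this toy data, the two "A" matches end up in one cluster and the two "B" matches in the other. Recording the order of the merges is what would give a tree whose end nodes are individual team-matches.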

Cluster descriptions:

  • Cluster 1 is 3-4-2-1

  • Cluster 2 is 3-4-2-1 mixed with various other three-at-the-back formations

  • Cluster 3 is 4-3-3

  • Clusters 4 to 7 are a mix of various four-at-the-back formations, with a strong 4-2-3-1 element in all of them

  • Cluster 8 does not seem to be linked to any particular formation

Cluster Examples

A match from each of the clusters is shown below. I’ve also added some comparisons between clusters that look somewhat similar.

Cluster 1

Comparing clusters 1 and 2

Cluster 2

Cluster 3

Cluster 4

Comparing clusters 4 and 5

Comparing clusters 4 and 6

Comparing clusters 4 and 7

Cluster 5

Comparing clusters 5 and 6

Comparing clusters 5 and 7

Cluster 6

Comparing clusters 6 and 7

Cluster 7

Cluster 8

To Do

I plan to add data from La Liga, the Bundesliga, Serie A, Ligue 1, and the Championship from 2017/18. A bigger set of matches with more diverse strategies might lead to different, more general clusters.